Sketching for Principal Component Regression
Authors
Abstract
Principal component regression (PCR) is a useful method for regularizing linear regression. Although conceptually simple, straightforward implementations of PCR have high computational costs and so are impractical when learning from large-scale data. In this paper, we propose efficient algorithms for computing approximate PCR solutions that, on the one hand, are high-quality approximations to the true PCR solutions (when viewed as minimizers of a constrained optimization problem) and, on the other hand, admit rigorous risk bounds (when viewed as statistical estimators). In particular, we propose input-sparsity time algorithms for approximate PCR. We also consider computing an approximate PCR solution in the streaming model, and kernel PCR. Empirical results demonstrate the excellent performance of our proposed methods.
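To make the setting concrete, the following is a minimal sketch of PCR and of one generic way to approximate it by sketching, assuming NumPy. The function names `pcr` and `sketched_pcr` and the parameter `sketch_rows` are illustrative choices, and the dense Gaussian sketch is a simple stand-in: it is not the input-sparsity time algorithm the abstract refers to, and it carries none of the paper's guarantees.

```python
import numpy as np


def pcr(X, y, k):
    """Exact PCR: least squares restricted to the span of the
    top-k right singular vectors of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k].T                      # d x k basis of the dominant subspace
    # X @ Vk equals U[:, :k] * s[:k], so the restricted solution is closed-form.
    coef = (U[:, :k].T @ y) / s[:k]
    return Vk @ coef


def sketched_pcr(X, y, k, sketch_rows, seed=0):
    """Approximate PCR (illustrative only): estimate the dominant
    subspace from a Gaussian sketch S @ X instead of from X itself.
    Requires sketch_rows >= k."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: a dense Gaussian sketch; the paper's sketches may differ.
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    _, _, Vt = np.linalg.svd(S @ X, full_matrices=False)
    Vk = Vt[:k].T                      # approximate dominant subspace
    # Solve the k-dimensional least-squares problem in the projected space.
    coef, *_ = np.linalg.lstsq(X @ Vk, y, rcond=None)
    return Vk @ coef
```

The expensive step in `pcr` is the SVD of the full n-by-d matrix. In `sketched_pcr` the SVD is taken of the much smaller `sketch_rows`-by-d matrix, at the price of an approximate subspace. When X has rank exactly k, the two functions return the same vector up to rounding error.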
Similar resources
An application of principal component analysis and logistic regression to facilitate production scheduling decision support system: an automotive industry case
Production planning and control (PPC) systems have to deal with rising complexity and dynamics. The complexity of planning tasks is due to multiple variables and dynamic factors derived from the uncertainties surrounding PPC. Although the literature on exact scheduling algorithms, simulation approaches, and heuristic methods in production planning is extensive, they seem to be ineff...
Sketching for Kronecker Product Regression and P-splines
TensorSketch is an oblivious linear sketch introduced in (Pagh, 2013) and later used in (Pham and Pagh, 2013) in the context of SVMs for polynomial kernels. It was shown in (Avron et al., 2014) that TensorSketch provides a subspace embedding, and therefore can be used for canonical correlation analysis, low rank approximation, and principal component regression for the polynomial kernel. We tak...
A Radhika: Effective Summary for Massive Data Set
As data sizes grow, there is increasing interest in designing effective algorithms for space and time reduction. Applying high-dimensional techniques to large data sets is difficult. However, randomized techniques are used for analyzing the data set where the performance of the data from part of storage in networks needs to be collected and analyzed continuou...
Surface EMG-based Sketching Recognition Using Two Analysis Windows and Gene Expression Programming
Sketching is one of the most important processes in the conceptual stage of design. Previous studies have relied largely on the analyses of sketching process and outcomes; whereas surface electromyographic (sEMG) signals associated with sketching have received little attention. In this study, we propose a method in which 11 basic one-stroke sketching shapes are identified from the sEMG signals ...
Predicting the Young's Modulus and Uniaxial Compressive Strength of a typical limestone using the Principal Component Regression and Particle Swarm Optimization
In geotechnical engineering, rock mechanics, and engineering geology, the uniaxial strength and static Young's modulus of rocks are of vital importance, depending on the project design. The direct determination of these parameters in the laboratory, however, requires intact, high-quality cores, and the preparation of their specimens has some limitations. Moreover, performing thes...